In complex networks it is common to model a network or generate a surrogate network based
on the conservation of the network's degree distribution. We provide an alternative network model
based on the conservation of connection density within a set of nodes. This density is measured by the
rich-club coefficient. We present a method to generate surrogate networks with a given rich-club
coefficient. We show that by choosing a suitable local linking term, the generated random networks
can reproduce the degree distribution and the mixing pattern of real networks. The method is easy
to implement and produces good models of real networks.



* r.j.mondragon@eecs.qmul.ac.uk
† S.Zhou@cs.ucl.ac.uk






I. INTRODUCTION 



Many complex networks are single networks in the sense that their structure is unique, not one of several. In
general, we do not have the equivalent of a physical law to verify whether the statistical measures obtained from a
single network are expected or exceptional. Instead, a common technique to analyze the properties of a single network
is to use statistical randomization methods [1] to create a reference network which is used for comparison purposes.
The procedure consists of using the observed network to generate an ensemble of networks via randomization, usually
the reshuffling of the links. The reference network is generated from this ensemble and is used to assess the
significance of a property of the network. To take into consideration the intrinsic structure of the network it is
common to use a restricted randomization procedure. The restriction is to reshuffle the connections of the nodes
without changing the degree distribution $P(k)$ [2], that is, the fraction of nodes in the network with degree $k$.
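
As an illustration of the restricted randomization described above, the following minimal Python sketch reshuffles links by repeated double-edge swaps, one common way to keep every node's degree fixed. It is not necessarily the exact procedure of the cited references: the function name `degree_preserving_rewire` and its parameters are hypothetical, and the network is assumed to be simple and undirected.

```python
import random

def degree_preserving_rewire(edges, n_swaps, seed=None, max_tries=None):
    """Reshuffle links by double-edge swaps; every node keeps its degree.

    Assumes a simple undirected graph given as a list of (u, v) pairs.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    if max_tries is None:
        max_tries = 100 * n_swaps  # assumed cap so the loop always terminates
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible re-pairings at random
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b)
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate link
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

# Toy usage on a 6-node ring
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(degree_preserving_rewire(ring, n_swaps=10, seed=1))
```

Each accepted swap removes two links and adds two links on the same four end points, so the degree sequence, and therefore $P(k)$, is unchanged.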

Many complex networks contain a small set of nodes that have large numbers of links, the so-called rich nodes.
In some networks the rich nodes tend to be tightly interconnected between themselves, forming a rich-club [?]. The
rich-club is an oligarchy in that it dominates the organization of the whole network. In scale-free networks [3]
the connectivity of the rich-club plays an important role in the functionality of the network, for example in the
transmission of rumors in social networks [?] or the efficient delivery of information in the Internet [?]. The
rich-club coefficient measures the density of connections among this group of nodes. This set of nodes can be defined
by a ranking scheme [?], by their degree [?] or by a network hierarchy. If the nodes are ranked in non-increasing
order of their degree, i.e. the best-connected node is ranked as $r = 1$, the second best-connected node as $r = 2$
and so on, then the density of connections between the first $r$ nodes is quantified by the rich-club coefficient [?]



$$\Phi(r) = \frac{2E(r)}{r(r-1)}, \qquad (1)$$

where $E(r)$ is the number of links between the $r$ nodes and $r(r-1)/2$ is the maximum number of links that these
nodes can share. As a function of the node degree, the rich-club coefficient is

$$\phi(k) = \frac{2E_k}{N_k(N_k-1)}, \qquad (2)$$

where $N_k$ is the number of nodes with degree equal to or higher than $k$ and $E_k$ is the number of links among these
nodes. It is known that the rich-club coefficient $\phi(k)$ is a projection of the degree-degree correlation [?], is
independent of the degree distribution [?] and is non-trivially related to other properties like the clustering
coefficient [10].
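
The rank-based definition in equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, ties in degree are broken by node label (an assumption, one of many possible tie-breaking rules), and the input is assumed to be a simple undirected edge list without self-loops or repeated links.

```python
from collections import defaultdict

def rank_nodes(edges):
    """Nodes sorted by non-increasing degree; ties broken by node label (assumption)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree, key=lambda n: (-degree[n], n))

def rich_club_by_rank(edges):
    """Return [Phi(1), ..., Phi(N)] with Phi(r) = 2 E(r) / (r (r - 1)) and Phi(1) = 0."""
    order = rank_nodes(edges)
    rank = {node: r for r, node in enumerate(order, start=1)}
    # new_links[r] = number of links between node r and nodes of rank < r
    new_links = [0] * (len(order) + 1)
    for u, v in edges:
        new_links[max(rank[u], rank[v])] += 1
    phi, e_r = [0.0], 0
    for r in range(2, len(order) + 1):
        e_r += new_links[r]  # E(r) = E(r - 1) + links gained by adding node r
        phi.append(2.0 * e_r / (r * (r - 1)))
    return phi

# Toy usage: phi[r - 1] holds Phi(r)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
print(rich_club_by_rank(edges))
```

Counting each link at the larger of its two end-point ranks gives all $E(r)$ in a single pass over the links.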

It is possible to construct a surrogate network where the degree distribution $P(k)$ is conserved [1, ?], the
rich-club coefficient $\phi(k)$ is conserved, or both of these properties are conserved [?]. Our aim here is to build a
network model and surrogate random networks based on the rank rich-club coefficient $\Phi(r)$ and to show that they are
good models of scale-free networks, as they tend to reproduce key statistical properties.

Before introducing the model, we would like to remark that the shape of the function $\Phi(r)$ depends on the way that
the nodes are ranked. There is an ambiguity when labeling the nodes via a degree-dependent rank. For high-degree
nodes this is not a problem, as the degrees tend to be unique, so the rank labels these nodes unambiguously. For
lower-degree nodes, there are many nodes with the same degree. In this case the labeling of the nodes is not unique.
It is possible to greatly reduce this redundancy of the labels by using a linear order relationship [12], where the
degree of the second neighbors is used to disambiguate the labeling. This linear ordering has been used to visualize
higher-order correlations in the network topology [?, 13]. We observed that the choice between a degree-only ranking
scheme and a linear-order ranking has little effect on the statistical properties under consideration here. So in this
communication we consider only a degree-dependent ranking scheme.

Figure 1 shows $\Phi(r)$ for the Scientists co-authorship (Scientists) [?], the protein interaction (Protein) [3], the
AS-Internet (Internet) [13] and the power grid (Power) [?] networks. We selected these networks because they have
been widely used when studying the topological properties of complex networks and because they have different
topological characteristics (e.g. [?]). These networks have very different connectivities between the high-degree
nodes. The Internet data has a fully connected core where the top seven nodes are fully connected ($\Phi(7) = 1$), in
contrast with the power grid network where the top fourteen nodes do not share a link at all ($\Phi(14) = 0$).

FIG. 1. The rich-club coefficient $\Phi(r)$ for the four networks under study.



II. NETWORKS DEFINED BY THE RICH-CLUB COEFFICIENT 



From equation (1), the number of links that node $r$ shares with the $r-1$ nodes of higher or equal degree is

$$\Delta E(r) = E(r) - E(r-1) = \Phi(r)\frac{r(r-1)}{2} - \Phi(r-1)\frac{(r-1)(r-2)}{2}, \qquad (3)$$

where $\Delta E(1) = 0$ and $\sum_{i=1}^{N} \Delta E(i) = L$, with $L$ the total number of links and $N$ the total number
of nodes. The term $\Delta E(1)$ is zero because the node $r = 1$ is the top node. The number of links $\Delta E(r)$ is
bounded by $0 \le \Delta E(r) \le r-1$. If $\Delta E(r)$ is known then for $r \ge 2$, $\Phi(r)$ is obtained from the
recursive equation

$$\Phi(r) = \frac{2\Delta E(r) + (r-1)(r-2)\Phi(r-1)}{r(r-1)}, \qquad (4)$$

with $\Phi(1) = 0$.
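
Equations (3) and (4) are inverses of each other, which a short Python sketch makes concrete. The function names are hypothetical, and lists are indexed so that entry `r - 1` holds the value for rank $r$.

```python
def delta_e_from_phi(phi):
    """Equation (3): links between node r and the r - 1 higher-ranked nodes."""
    d_e = [0.0]  # Delta E(1) = 0
    for r in range(2, len(phi) + 1):
        e_r = phi[r - 1] * r * (r - 1) / 2.0           # E(r)
        e_prev = phi[r - 2] * (r - 1) * (r - 2) / 2.0  # E(r - 1)
        d_e.append(e_r - e_prev)
    return d_e

def phi_from_delta_e(d_e):
    """Equation (4): recursion for Phi(r), r >= 2, starting from Phi(1) = 0."""
    phi = [0.0]
    for r in range(2, len(d_e) + 1):
        phi.append((2.0 * d_e[r - 1] + (r - 1) * (r - 2) * phi[-1]) / (r * (r - 1)))
    return phi

# Toy usage: a 5-node network with Phi = (0, 1, 1, 2/3, 1/2)
phi = [0.0, 1.0, 1.0, 2.0 / 3.0, 0.5]
d_e = delta_e_from_phi(phi)
print(d_e)
print(phi_from_delta_e(d_e))
```

For real data $\Delta E(r)$ is an integer, so rounding the result of `delta_e_from_phi` removes any floating-point noise.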

To determine the network connectivity we need to distribute the $\Delta E(r)$ links between node $r$ and the nodes
$r' \in [1, r-1]$, for all $r$. Let us assume that $P(r', r)$ is the probability that node $r$ connects to node
$r' < r$, with $P(r, r) = 0$ as self-loops are not allowed, and that two nodes can share only one link. Given
$\Delta E(r)$ links for $1 \le r \le N$, we constrain the connectivity of a network by imposing the condition that the
average number of links, $\overline{\Delta E}(r)$, between node $r$ and the $r-1$ nodes of lower rank is

$$\overline{\Delta E}(r) = \sum_{i=1}^{r-1} P(i, r) = \Delta E(r). \qquad (5)$$

This last equation defines an ensemble of networks whose average rich-club coefficient $\bar{\Phi}(r)$ is defined by
equations (4) and (5). In a network from this ensemble the average degree of node $r$ is

$$\bar{k}_r = \sum_{r'=1}^{N} P(r', r) = \Delta E(r) + \sum_{i=r+1}^{N} P(r, i), \qquad (6)$$

with standard deviation $\sigma_{k_r}$ given by $\sigma_{k_r}^2 = \sum_{r'=1}^{N} P(r', r)\,(1 - P(r', r))$, where
$P(r', r)$ for $r' > r$ denotes $P(r, r')$. The average number of links is $\bar{L} = \sum_{r' < r} P(r', r)$ with
standard deviation $\sigma_L$ given by $\sigma_L^2 = \sum_{r' < r} P(r', r)\,(1 - P(r', r))$, and the average degree
of the nearest-neighbors [?] of a node
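with degree $k$ is

$$k_{nn}(k) = \frac{1}{N_k} \sum_{r=1}^{N} \delta_{k, \bar{k}_r}\, \frac{1}{k} \sum_{r'=1}^{N} P(r', r)\, \bar{k}_{r'}. \qquad (7)$$

In the above equation, the Kronecker delta is introduced to consider only nodes with degree equal to $k$, $N_k$ is the
number of $k$-degree nodes and the term $1/k$ is a normalization factor.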
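
The ensemble averages for the node degrees and the total number of links can be evaluated directly from the link probabilities. The sketch below is a minimal illustration under stated assumptions: links are treated as independent, the probabilities are stored in a hypothetical upper-triangular list `p` where `p[i][j]` with `i < j` is the probability of a link between ranks `i + 1` and `j + 1`, and the function names are not from the original text.

```python
def expected_degrees(p):
    """Mean and variance of each node's degree from pairwise link probabilities."""
    n = len(p)
    mean = [0.0] * n
    var = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            q = p[i][j]
            # The link (i, j) contributes to the degree of both end points
            mean[i] += q
            mean[j] += q
            var[i] += q * (1.0 - q)
            var[j] += q * (1.0 - q)
    return mean, var

def expected_links(p):
    """Mean and variance of the total number of links."""
    n = len(p)
    pairs = [p[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(pairs), sum(q * (1.0 - q) for q in pairs)

# Toy usage with three nodes
p = [[0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
print(expected_degrees(p))
print(expected_links(p))
```

The standard deviations are the square roots of the returned variances.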



A. Local linking term $T(r', r)$



As an ansatz we assume that the probability that there is a link between node $r$ and $r'$ can be factorized as

$$P(r', r) = T(r', r)\, \Delta E(r), \quad r' < r, \qquad (8)$$

where $T(r', r)$ is a local linking factor. From equation (5) we have that $\sum_{r'=1}^{r-1} T(r', r) = 1$. Here the
sample space is the set of all possible combinations of links that node $r$ has with the $r-1$ nodes of lower rank,
with the restriction that only one link is shared between two nodes. The set of events used in the definition of the
probability is the set of different ways in which the $\Delta E(r)$ links can be shared between node $r$ and the $r-1$
lower-rank nodes. Notice that $\Delta E(r)$ is bounded by $0 \le \Delta E(r) \le r-1$.



1. Egalitarian linking 

The simplest case is when the $\Delta E(r)$ links that node $r$ can share with the $r-1$ nodes are evenly distributed.
Then the probability that node $r$ connects to $r'$ is

$$P(r', r) = \frac{1}{r-1}\, \Delta E(r), \quad r' < r, \qquad (9)$$

where $T(r', r) = 1/(r-1)$. For example, if node $r$ has a link with all the $r-1$ nodes of lower rank, i.e.
$\Delta E(r) = r-1$, then $P(r', r) = 1$ for $r' < r$.



2. Preferential linking 

For the case that node $r$ prefers to connect to nodes with lower rank (i.e. higher degree), we propose the preferential
linking term $T(r', r) = {r'}^{-\alpha} / S(r)$, where $\alpha > 0$ is a constant and $S(r)$ is a normalization factor.
The probability that there is a link between $r$ and $r'$ is

$$P(r', r) = \frac{{r'}^{-\alpha}}{S(r)}\, \Delta E(r) = \left( \frac{{r'}^{-\alpha}}{\sum_{i=1}^{r-1} i^{-\alpha}} \right) \Delta E(r), \qquad (10)$$

where $S(r) = \sum_{i=1}^{r-1} i^{-\alpha}$ to ensure that $\sum_{i=1}^{r-1} P(i, r) = \Delta E(r)$ (see equation (5)).
Note that for some networks the probability given by equation (10) may be larger than 1; in that case a more suitable
$T(r', r)$ should be considered.
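
Both linking terms can be evaluated with one routine, since the egalitarian term of equation (9) is the $\alpha = 0$ case of the preferential term of equation (10). The following Python sketch is illustrative only: the function names and the dense list-of-lists storage are assumptions chosen for clarity, not for efficiency on large networks.

```python
def link_probabilities(d_e, alpha=0.0):
    """P(r', r) = r'^(-alpha) / S(r) * Delta E(r) for r' < r.

    d_e[r - 1] holds Delta E(r). Entry p[r' - 1][r - 1] holds P(r', r);
    alpha = 0 gives the egalitarian term, alpha > 0 the preferential one.
    """
    n = len(d_e)
    p = [[0.0] * n for _ in range(n)]
    for r in range(2, n + 1):
        weights = [i ** (-alpha) for i in range(1, r)]
        s_r = sum(weights)  # normalization factor S(r)
        for r_prime in range(1, r):
            p[r_prime - 1][r - 1] = weights[r_prime - 1] / s_r * d_e[r - 1]
    return p

def max_probability(p):
    """Largest entry; a value above 1 signals an unsuitable linking term."""
    return max(max(row) for row in p)

# Toy usage
d_e = [0, 1, 2, 1, 1]
print(max_probability(link_probabilities(d_e, alpha=0.0)))
print(max_probability(link_probabilities([0, 1, 2], alpha=2.0)))  # exceeds 1
```

The second call shows the caveat noted above: with a steep exponent and $\Delta E(3) = 2$, the value assigned to the pair $(1, 3)$ is larger than 1 and so is no longer a valid probability.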



FIG. 2. Topological properties of an egalitarian rich-club network constructed from the Scientists co-authorship network. 



B. Evaluation of the model 

Figure 1(a) shows the rich-club coefficient $\Phi(r)$ as a function of node rank $r$ for the Scientists network. From
the data we evaluated $\Phi(r)$ and $\Delta E(r)$. To build a model of the Scientists network we assume that the local
linking term is egalitarian and evaluated $P(r', r)$, $\bar{k}_r$ and $k_{nn}(k)$ using equations (9), (6) and (7). To
obtain an integer value of the node's degree we used the integer function $\lfloor \bar{k}_r \rfloor$. The degree
distribution $P(k)$ was obtained using $\lfloor \bar{k}_r \rfloor$ for all $r$. Figures 2(a) to (c) show that our model
resembles the real network, except that the top-ranked nodes have similar degrees, which is not true in the real
network (see $r < 20$ in Figure 2(a)). In fact these nodes have significantly different degrees, which suggests that
there is preferential linking between the high-degree nodes. To verify this last statement we created a network model
using the preferential linking defined in equation (10). The exponent $\alpha$ was obtained by fitting the average
degree $\bar{k}_r$ of node $r$, given by equation (6), against the degree $k_r$ from the original network. We did so by
obtaining the value of $\alpha$ that minimizes the squared error $\sum_r (k_r - \bar{k}_r)^2$ to a precision of
$10^{-?}$ in $\alpha$. Figures 3(a)-(c) show that a model based on preferential linking is a good approximation of the
Scientists network.
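
The fit of the exponent can be sketched as a one-dimensional search over $\alpha$. The Python sketch below uses a plain grid search, which is an assumption: the text states only that the squared error is minimized. The function names, the search range `alpha_max` and the grid spacing `step` are likewise hypothetical choices made for illustration.

```python
def model_degrees(d_e, alpha):
    """Average degree of every node under preferential linking, equation (6)."""
    n = len(d_e)
    k_bar = [float(x) for x in d_e]  # links to higher-ranked nodes
    weights, s_r = [], 0.0
    for r in range(2, n + 1):
        weights.append((r - 1) ** (-alpha))
        s_r += weights[-1]           # S(r) = sum of i^(-alpha) for i < r
        scale = d_e[r - 1] / s_r
        if scale:
            # Node r contributes P(r', r) to the degree of every r' < r
            for r_prime in range(1, r):
                k_bar[r_prime - 1] += weights[r_prime - 1] * scale
    return k_bar

def fit_alpha(degrees, d_e, alpha_max=2.0, step=0.01):
    """Grid search for the alpha minimizing sum_r (k_r - k_bar_r)^2."""
    best_alpha, best_err = 0.0, float("inf")
    for j in range(int(round(alpha_max / step)) + 1):
        alpha = j * step
        k_bar = model_degrees(d_e, alpha)
        err = sum((k - kb) ** 2 for k, kb in zip(degrees, k_bar))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err

# Toy usage: degrees listed in rank order, with the matching Delta E(r)
degrees = [3, 2, 2, 2, 1]
d_e = [0, 1, 2, 1, 1]
print(fit_alpha(degrees, d_e))
```

Each evaluation of `model_degrees` costs $O(N^2)$ operations, so for large networks a coarser grid followed by a local refinement may be preferable.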

We also created network models based on preferential linking for the Protein interaction network [3], the
AS-Internet and the Power grid network [?]. Figures 3(d)-(l) demonstrate that our models closely reproduce the
degree distribution and give a good approximation to the nearest-neighbors average degree for both assortative
(co-authorship) and disassortative (Internet, Protein and Power grid) networks.



III. GENERATING RANDOM NETWORKS WITH GIVEN RICH-CLUB COEFFICIENT 

For a given network we can generate a surrogate random network which preserves the total number of nodes $N$, the
total number of links $L$, and the rich-club coefficient $\Phi(r)$ for all $r$. First the $N$ nodes are ranked by their
degree and the number of links $\Delta E(r)$ is evaluated from $\Phi(r)$. To create a surrogate network, we create $N$
stub nodes by assigning $\Delta E(r)$ links to each node $r \in [2, N]$. From $r = 2$ to $N$, we connect the
$\Delta E(r)$ links between the stub node $r$ and the smaller-ranked nodes $r' < r$. The probability that node $r$
connects to node $r'$ is defined by equation (5). We do not allow self-links or two nodes sharing more than one link.
Notice that $\sum_{r=1}^{N} \Delta E(r) = L$. All of these surrogate networks will have the same rich-club coefficient
as the original network because all of them have the same $\Delta E(r)$ for all $r$.
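
The procedure above can be sketched in Python as follows. The sampling rule is an assumption: the text does not specify how the $\Delta E(r)$ distinct targets are drawn, so this sketch draws them one at a time without replacement, with weights proportional to the preferential linking term ($\alpha = 0$ gives egalitarian linking). The resulting inclusion probabilities therefore match $T(r', r)\,\Delta E(r)$ only approximately. The function name and parameters are hypothetical.

```python
import random

def surrogate_network(d_e, alpha=0.0, seed=None):
    """Generate links (r', r), r' < r, with exactly Delta E(r) links per node r.

    d_e[r - 1] holds Delta E(r); ranks are 1-based node labels.
    """
    rng = random.Random(seed)
    edges = []
    for r in range(2, len(d_e) + 1):
        n_links = int(round(d_e[r - 1]))
        if not 0 <= n_links <= r - 1:
            raise ValueError(f"Delta E({r}) must lie in [0, {r - 1}]")
        candidates = list(range(1, r))
        weights = [c ** (-alpha) for c in candidates]
        for _ in range(n_links):
            # Weighted draw, then remove the target so no link is duplicated
            idx = rng.choices(range(len(candidates)), weights=weights)[0]
            edges.append((candidates.pop(idx), r))
            weights.pop(idx)
    return edges

# Toy usage
print(surrogate_network([0, 1, 2, 1, 1], alpha=0.28, seed=1))
```

Because every node $r$ receives exactly $\Delta E(r)$ links to nodes with smaller labels, $E(r)$ and hence $\Phi(r)$, evaluated with the original rank labels, are identical in every realization.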

As an example, we illustrate the technique using the Scientists co-authorship network with preferential linking.
Figures 4(a)-(b) show the average node degree as a function of node rank, obtained from 40 surrogate random networks
generated using preferential linking with $\alpha = 0.28$. The average degree of the random networks (lines with error
bars) closely approximates the degree of the original network (open circles). In the figure we also plotted the
average degree obtained from equation (6) (filled diamonds). Notice that we expect a discrepancy between the average
degree evaluated from the random networks and the average degree obtained from equation (6). The reason is that the
restriction imposed by equation (5) means that the average number of links between node $r$ and the nodes $r' < r$ is
$\Delta E(r)$. However, the random surrogate networks are generated with the restriction that the number of links
between node $r$ and the nodes $r' < r$ is exactly $\Delta E(r)$.

FIG. 3. The node degrees $k_r$, their degree distribution $P(k)$ and the average degree of the nearest-neighbors
$k_{nn}$ for the four networks under study. The properties of the original networks are shown in black and their models
in gray. For these $\alpha$ values, using linear regression of $(k_r - \bar{k}_r)$ vs. $r$, the regression coefficient
$R$ of the networks is between 0.85 and 0.87.

FIG. 4. (a) Degree $k_r$ of the 20 smallest-ranked nodes (i.e. $1 \le r \le 20$), and (b) for $6000 \le r \le 6020$.
The open circles are the original data, the filled diamonds are the average obtained from equation (6) and the lines
with error bars are obtained from 40 realizations of the random networks. (c) The nearest-neighbors average degree
$k_{nn}(k)$ for the 40 random networks (dots) compared against the value of the original network (solid line).

Figure 4(c) compares the nearest-neighbors average degree $k_{nn}(k)$ of the original network (line) against the values
obtained from the 40 random networks (scattered dots). The random networks closely resemble the assortative
property of the real network.



IV. CONCLUSION 



We investigated the properties of random networks defined by a rank-based rich-club coefficient. We show that 
using a preferential local linking term, we can approximate the degree distribution and the mixing pattern of a variety 
of real networks. We also introduced a method to generate surrogate random networks conserving the network's 
rich-club coefficient. While existing surrogate network models have focused on connectivity of individual nodes, our 
work provides an alternative method based on the hierarchical structure of a network in terms of link density among 
a group of nodes. 